Rebase Huawei serving commits onto new-serving-2.20 #51
Closed
Copilot wants to merge 7663 commits into new-serving-2.20 from
Conversation
PiperOrigin-RevId: 837800257
…mpliant targets `tf_profiler_pybind_cc_library_wrapper` creates a cc_header_only_target, a target that exports all transitively exported headers. If py_wrap is enabled, however, cc_header_only_target just creates a cc_library target, which breaks the layering check. This change therefore makes tf_profiler_pybind_cc_library_wrapper create an alias target instead of a cc_header_only_library when py_wrap is enabled. PiperOrigin-RevId: 837811846
This change replaces usages of tsl::errors::Unimplemented with absl::UnimplementedError, wrapping arguments in absl::StrCat where necessary. This addresses deprecation warnings and moves towards standard Abseil error handling. The deprecated tsl::errors::Unimplemented function was identified in third_party/tensorflow/compiler/xla/tsl/platform/errors.h. Changes: - Replaced errors::Unimplemented with absl::UnimplementedError. - Used absl::StrCat to construct error messages where necessary. PiperOrigin-RevId: 837814305
The hang was resolved at head. With the new shapes, the test takes ~8 seconds vs 110 seconds before. PiperOrigin-RevId: 837814726
1. If the collective is degenerate, emit the memcpy thunk immediately. 2. If the collective is not implementable, return an error status. 3. Otherwise, emit the collective thunk. The current logic is equivalent, but more convoluted without good reason. PiperOrigin-RevId: 837814909
…pace No longer triton specific, shared between GPU and CPU. PiperOrigin-RevId: 837820736
PiperOrigin-RevId: 837823432
This change replaces usages of tsl::errors::Internal with absl::InternalError, wrapping arguments in absl::StrCat where necessary. This addresses deprecation warnings and moves towards standard Abseil error handling. PiperOrigin-RevId: 837836286
…ons. PiperOrigin-RevId: 837839463
…imized module and literals. PiperOrigin-RevId: 837843890
that removes most of the code duplication and call to the gpu backend in compile. PiperOrigin-RevId: 837848792
This change replaces usages of tsl::errors::OutOfRange with absl::OutOfRangeError, wrapping arguments in absl::StrCat where necessary. This addresses deprecation warnings and moves towards standard Abseil error handling. PiperOrigin-RevId: 837859077
This is to generate more helpful error messages than failing at the IFRT op execution level, e.g., `CopyArrays` complaining about mismatching devices. PiperOrigin-RevId: 837895552
PiperOrigin-RevId: 837910560
Include data_type in ExactInterpolatorKey::operator== to correctly distinguish keys. Remove the "optonly" tag from sol_latency_estimator_test. PiperOrigin-RevId: 837912807
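The fix above can be illustrated with a minimal sketch. The field names here are hypothetical (the real key's fields are not shown in this PR); the point is that omitting `data_type` from `operator==` would make keys that differ only in data type compare equal and collide:

```cpp
#include <cstdint>
#include <tuple>

// Hypothetical sketch of the fix: include data_type in the equality
// comparison so keys differing only in data type are distinguished.
struct ExactInterpolatorKey {
  int64_t m;          // illustrative fields only
  int64_t n;
  int data_type;

  bool operator==(const ExactInterpolatorKey& other) const {
    // Before the fix, data_type was (hypothetically) omitted here.
    return std::tie(m, n, data_type) ==
           std::tie(other.m, other.n, other.data_type);
  }
};
```

Any hash function used alongside this key would need the same treatment, since equal-hashing unequal keys is exactly the collision the commit message describes.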
This change replaces usages of tsl::errors::PermissionDenied with absl::PermissionDeniedError, wrapping arguments in absl::StrCat where necessary. This addresses deprecation warnings and moves towards standard Abseil error handling. Changes: - Replaced errors::PermissionDenied with absl::PermissionDeniedError. - Used absl::StrCat to construct error messages where necessary. PiperOrigin-RevId: 837914991
…ors::InvalidArgument in xla This change replaces usages of tsl::errors::DataLoss with absl::DataLossError and tsl::errors::InvalidArgument with absl::InvalidArgumentError, wrapping arguments in absl::StrCat where necessary. This addresses deprecation warnings and moves towards standard Abseil error handling. PiperOrigin-RevId: 837916202
This change replaces usages of tsl::errors::OutOfRange with absl::OutOfRangeError, wrapping arguments in absl::StrCat where necessary. This addresses deprecation warnings and moves towards standard Abseil error handling. The deprecated tsl::errors::OutOfRange function was identified in third_party/tensorflow/compiler/xla/tsl/platform/errors.h. Changes: - Replaced errors::OutOfRange with absl::OutOfRangeError. - Used absl::StrCat to construct error messages where necessary. PiperOrigin-RevId: 838006439
PiperOrigin-RevId: 838010118
This change removes `operation_queue_id: "0"`, `wait_on_operation_queues: []`, and other fields like `force_earliest_schedule: false`, `sliding_window_length: 0`, and `force_deterministic: false` from the `backend_config` in various test HLO strings. These fields are being removed because they represent default values and do not need to be explicitly specified. PiperOrigin-RevId: 838017400
PiperOrigin-RevId: 838042897
PiperOrigin-RevId: 838042906
…and resolve nvml linker errors This change addresses the deprecation of `tsl::errors::Unimplemented` by replacing its usages with `absl::UnimplementedError`, wrapping arguments in `absl::StrCat` where necessary. This brings the code closer to standard Abseil error handling. Changes: - Replaced `errors::Unimplemented` with `absl::UnimplementedError`. - Used `absl::StrCat` to construct error messages where necessary. PiperOrigin-RevId: 838065042
… xla This change replaces usages of tsl::errors::FailedPrecondition with absl::FailedPreconditionError, wrapping arguments in absl::StrCat where necessary. This addresses deprecation warnings and moves towards standard Abseil error handling. The deprecated tsl::errors::FailedPrecondition function was identified in third_party/tensorflow/compiler/xla/tsl/platform/errors.h. Changes: - Replaced errors::FailedPrecondition with absl::FailedPreconditionError. - Used absl::StrCat to construct error messages where necessary. PiperOrigin-RevId: 838085866
PiperOrigin-RevId: 838099456
PiperOrigin-RevId: 838120757
PiperOrigin-RevId: 838127701
…teReplicated. PiperOrigin-RevId: 838135801
It looks like we have at least two reimplementations of GetUniqueSanitizedName. PiperOrigin-RevId: 838138583
…itter. And a lot of minor refactoring. PiperOrigin-RevId: 838151169
- Enable serving build configuration - Add BatchSizeResource class for managing batch sizes in serving workloads - Add build rules for the new batch_size_resource target - Update python toolchain configuration for serving support
Introduce DynExpr (dynamic expression) support in XLA shape data structures: - Add shape_dynexpr.h with symbolic expression algebra for dynamic dimensions - Extend xla_data.proto with expression fields for dynamic dimension values - Extend xla.proto with batch size compilation options - Update Shape class to support DynExpr annotations on dimensions - Update ShapeUtil to handle shapes with dynamic expression annotations - Add build rules for the new shape_dynexpr target
Introduce DynExpr support in TensorFlow's core shape framework: - Add tensor_shape_expr.h/cc with symbolic expression support for TF shapes - Extend tensor_shape.proto with expression fields - Update TensorShape class to support expression annotations on dimensions - Update ShapeInference to propagate dynamic expression information - Update common_shape_fns to handle dynamic expressions during shape inference
- Add XlaBatchMatcher to select optimal batch sizes for XLA compilation - Support finding the next power-of-2 batch size for efficient compilation - Add tf_xla_compile_batch_sizes flag to specify compile-time batch sizes - Add tf_xla_threshold_for_megamorphic flag for megamorphic threshold - Add tf_xla_annotate_cluster_id and cluster_single_dynamic_dim flags - Update BUILD rules for new batch matcher target
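The power-of-2 selection above can be sketched as follows. This is a hypothetical helper, not the real XlaBatchMatcher API: rounding a runtime batch size up to the next power of two bounds the set of compiled shapes, so the compilation cache hits on a small number of sizes.

```cpp
#include <cstdint>

// Round n up to the next power of two (assumes n fits without overflow).
// Illustrative only; the real XlaBatchMatcher interface is not shown here.
int64_t NextPowerOfTwo(int64_t n) {
  if (n <= 1) return 1;
  int64_t p = 1;
  while (p < n) p <<= 1;
  return p;
}
```

With this scheme, any incoming batch between 513 and 1024 is served by the same compiled executable for batch 1024, at the cost of padding.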
- Add OuterDimensionPropagation pass to propagate batch dimension info - Add GetOuterBatchValueSimplifier pass to simplify batch value expressions - Extend XLA ShapeInference to support dynamic expression propagation - Add xla_outer_batch_size debug option flag - Extend ExecutableRunOptions with batch size field - Update BUILD rules for new service passes - Fix layout_assignment, reduce_scatter_combiner, triangular_solve_expander and hlo_creation_utils for DynExpr compatibility
… support - Add batch size retrieval from ExecutableRunOptions in LLVM IR loops - Update llvm_loop to pass batch size as dynamic dimension in loop bounds - Update llvm_util to emit batch size value into LLVM IR - Update loop_emitter and elemental_ir_emitter for dynamic batch dimension - Update CPU IR emitter and thunk emitter to pass batch size to kernels - Add executable_run_options_offset utility for accessing batch size in IR - Update CPU kernel API builder to pass outer batch dimension - Update CPU runtime kernel to support dynamic outer batch dimension - Add disable-reduce-window and dynamic batch size support in CPU compiler - Update BUILD rules for new CPU serving utilities
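The mechanism in the list above, reduced to its essence: the kernel's outer loop bound is not baked into the compiled shape but read from the run options at execution time. All names here are illustrative; the real path goes through ExecutableRunOptions and emitted LLVM IR rather than a plain C++ loop.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for ExecutableRunOptions carrying the dynamic
// outer batch dimension value.
struct RunOptions {
  int64_t outer_batch_size = 0;
};

// A "kernel" whose outer trip count comes from RunOptions rather than a
// static dimension: fills one value per (batch, element) pair.
std::vector<int64_t> RunKernel(const RunOptions& opts, int64_t per_batch) {
  std::vector<int64_t> out;
  for (int64_t b = 0; b < opts.outer_batch_size; ++b) {  // dynamic bound
    for (int64_t i = 0; i < per_batch; ++i) {
      out.push_back(b * per_batch + i);
    }
  }
  return out;
}
```

The same compiled loop body then serves any batch size, which is what lets one executable cover a range of runtime batches.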
- Update XlaBuilder to propagate dynamic expression annotations in shapes - Update HLO broadcast, slicing, and matrix operations for DynExpr shapes - Update HLO expanders (dot_decomposer, cholesky, eigh, qr, rng, bitcast) to preserve dynamic expression annotations during shape transformations - Update MLIR-to-HLO translation to handle DynExpr shape annotations - Update HLO pass pipeline to log dynamic expression information
Update tf2xla kernels to propagate and use dynamic expression (DynExpr) annotations when translating TF operations to XLA: - Update reshape, strided_slice, softmax, relu, reduction ops to preserve dynamic expression information during XLA lowering - Update reshape_op to handle dynamic batch dimension expressions - Update strided_slice to track dynamic dimension expressions - Update tensor_list, tensor_array, unique, and other kernels - Pass DynExpr from TF shape inference to XLA argument shapes - Add xla_compile_batch_sizes op support in xla_ops.cc - Update XlaCompiler and XlaOpKernel to thread DynExpr through compilation - Update shape_util to handle DynExpr in XLA shape conversion
- Update mark_for_compilation_pass to handle dynamic batch dimension clustering: - Add cluster_single_dynamic_dim option to limit dynamic dimensions per cluster - Exclude unranked nodes from clusters; keep output_shapes in _Arg nodes - Support tf_xla_threshold_for_megamorphic for compilation decisions - Update XlaRunOp (xla_ops.cc) to retrieve and pass batch size at runtime: - Fetch batch size from BatchSizeResource in step container - Match incoming batch to compiled shapes using XlaBatchMatcher - Handle padding and un-padding for batch-size mismatches - Update xla_launch_util to pass batch size to ExecutableRunOptions - Update encapsulate_subgraphs_pass to propagate output shape info - Update device_compiler to support batch-specific compilation caching - Update shape_inference to handle dynamic dimension expressions - Update strided_slice op and core util for DynExpr support - Update graph_properties to propagate DynExpr through grappler - Update function_ops to handle batch size in function execution - Update subgraph.cc and remapper to preserve DynExpr annotations
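The runtime matching step described above can be sketched as: given the set of batch sizes the executable was compiled for, pick the smallest one that fits the incoming batch and report how much padding is needed. This is a hypothetical helper under stated assumptions, not the real XlaBatchMatcher interface.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct BatchMatch {
  int64_t compiled_size;  // shape to dispatch to (-1 if none fits)
  int64_t padding;        // rows of padding to append
};

// Select the smallest compiled batch size >= runtime_batch.
BatchMatch MatchBatch(int64_t runtime_batch, std::vector<int64_t> compiled) {
  std::sort(compiled.begin(), compiled.end());
  for (int64_t size : compiled) {
    if (size >= runtime_batch) return {size, size - runtime_batch};
  }
  return {-1, 0};  // no compiled shape fits; caller must (re)compile
}
```

Choosing the smallest fitting size minimizes wasted padding work while still reusing the compilation cache.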
Copilot
AI
changed the title
[WIP] Squash commits into separate commits for review
Squash for-serving-2.20 Huawei serving commits into 9 logical commits targeting new-serving-2.20
Mar 20, 2026
Only use the Eigen-based dot product implementation when the batch dimension is dynamic, avoiding it for static shapes where the standard XLA implementation is preferable.
Extract expression inference logic into encapsulate_util.cc/h so it can be shared across encapsulation passes. This avoids duplicating the logic and makes it easier to maintain consistency across passes.
#54) Extend expression propagation to more tf2xla operators: - reshape_op: track expression changes when reshaping dimensions - reverse_sequence_op: propagate expressions through reverse_sequence - shape_op: preserve expressions when computing shape - slice_op: track expression changes for slice dimensions - split_op: propagate expressions when splitting tensors - strided_slice_op: track expression changes for strided slice
Improve padding logic in XlaRunOp to derive the values needed for padding/unpadding (value_to_pad and value_after_pad) from the dynamic expression attached to the batch dimension. This ensures accurate padding behavior when expressions are available.
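A minimal sketch of the pad/unpad round trip for a batch-major buffer, assuming hypothetical helper names (the real XlaRunOp logic additionally derives the fill values from the dimension's dynamic expression, which is not modeled here):

```cpp
#include <cstdint>
#include <vector>

// Pad the batch dimension of a batch-major buffer up to the compiled
// batch size with a fill value before execution.
std::vector<float> PadBatch(const std::vector<float>& in, int64_t row_size,
                            int64_t compiled_batch, float fill) {
  std::vector<float> out(in);
  out.resize(compiled_batch * row_size, fill);
  return out;
}

// Slice the result back down to the true batch size after execution.
std::vector<float> UnpadBatch(const std::vector<float>& in, int64_t row_size,
                              int64_t true_batch) {
  return std::vector<float>(in.begin(), in.begin() + true_batch * row_size);
}
```

For a runtime batch of 2 compiled at batch 4 with rows of 2 elements, padding appends 4 fill elements and unpadding recovers exactly the original 4.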
Copilot
AI
changed the title
Squash for-serving-2.20 Huawei serving commits into 9 logical commits targeting new-serving-2.20
Squash for-serving-2.20 Huawei commits into logical groups targeting new-serving-2.20
Mar 20, 2026
Copilot
AI
changed the title
Squash for-serving-2.20 Huawei commits into logical groups targeting new-serving-2.20
Rebase Huawei serving commits onto new-serving-2.20
Mar 20, 2026
Copilot stopped work on behalf of
joeyye-work due to an error
March 20, 2026 16:13
- The PR branch contained 7,663 commits above new-serving-2.20 (7,649 upstream TF commits + 1 empty "Initial plan" commit + 13 Huawei commits) instead of just the 13 Huawei commits
- Rebased onto new-serving-2.20, resolving all conflicts (all followed an additive "take theirs" pattern)
- Created copilot/squash-rebased-onto-new-serving with exactly 13 Huawei commits above new-serving-2.20
- copilot/squash-rebased-onto-new-serving must be set as the head branch (manual step required; GitHub doesn't allow changing the head branch via API)